Design rules for the self-assembly of a protein crystal 
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Theories of protein crystaUization based on spheres that form close-packed crystals predict opti- 
mal assembly within a 'slot' of second virial coefficients and enhanced assembly near the metastable 
liquid- vapor critical point. However, most protein crystals are open structures stabilized by 
anisotropic interactions. Here, we use theory and simulation to show that assembly of one such 
structure is not predicted by the second virial coefficient or enhanced by the critical point. Instead, 
good assembly requires that the thermodynamic driving force be on the order of the thermal en- 
ergy and that interactions be made as nonspecific as possible without promoting liquid- vapor phase 
separation. 



The need to crystallize proteins for X-ray studies has 
spurred the development of theories of protein crystal- 
lization. These theories are largely based on the be- 
havior of spheres with short-range isotropic attractions, 
a representation motivated by two observations. First, 
phase diagrams for typical proteins and spherical col- 
loids with short range attractions are structurally simi- 
lar, possessing a metastable demixing transition between 
a vapor of solute (solute-poor solution) and a liquid of 
solute (solute-rich solution) pil4j. Second, both proteins 
and spherical colloids tend to crystallize when the second 
virial coefficient, an orient ationally- averaged measure of 
protein-protein attraction, lies in a defined 'crystalliza- 
tion slot' [5Hl]. On the computer, short-range isotropic 
spheres crystallize poorly above the metastable liquid- 
vapor binodal and show enhanced nucleation rates near 
or below it O [8HT2] . Such enhancement is indeed seen 
in some protein solutions pISHTS] . However, other exper- 
iments show disparities with this picture. Proteins can 
crystallize readily above the binodal [161 Ull and expe- 
rience kinetically-impaired crystallization below it [18]. 
They can also lie in the crystallization slot and not crys- 
tallize [19 . In addition, although the structure of protein 
and colloid phase diagrams is similar, the microscopic na- 
ture of the stable solid is not: most proteins do not form 
close-packed crystals [20 . 

These disparities motivate a theoretical approach 
to protein crystallization that acknowledges additional 
features of proteins' interactions, particularly their 
anisotropy |2TH26] . Such studies suggest that rules for 
optimal assembly of open structures are different from 
the rules for optimal assembly of close-packed structures. 
Here we explicitly demonstrate this difference. We have 
used extensive equilibrium and nonequilibrium numer- 
ical simulations and quantitatively accurate mean-field 
theory to exhaustively determine the design rules for 
optimal assembly of a model patterned after the SbpA 
surface-layer protein. The latter forms a porous square 
lattice with a tetrameric repeat unit on the surface of 
the bacterium Lysinibacillus sphaericus^ and in vitro on 
surfaces or in solution [27H3Q] . We impose a simple 
set of model protein interactions that stabilize the two 



condensed phases seen in experiments: specific inter- 
actions to stabilize the open crystal structure [31 and 
nonspecific interactions to stabilize unstructured aggre- 
gates observed on lipid bilayers [30 . A similar distinction 
between orientationally specific and nonspecific interac- 
tions has been considered in models of polymer crystal- 
lization i32j. Such a model is crucially different from 
isotropic models in that the same microscopic interac- 
tion does not stabilize both crystal and liquid phases. In- 
stead, specific and nonspecific interactions independently 
drive distinct critical behaviors [32^ 33 . Consequently, 
we find that design rules for assembly differ from those of 
spheres. While large density fluctuations promote crys- 
tallization of close-packed spheres [3 , they tend to inhibit 
the symmetry fluctuations required to achieve assembly 
of the open surface-layer lattice. Further, we find that 
the second virial coefficient B2 bounds good assembly 
but does not predict it. It is intuitively reasonable that 
B2 should bound assembly: too strong an attraction re- 
sults in kinetic trapping, while too weak an attraction 
suppresses crystal nucleation. However, its orientational 
averaging renders it blind to the microscopic origin of 
the attraction: model proteins with identical values of 
B2 but different combinations of specific and nonspecific 
interactions can assemble well, poorly, or not at all. 

Instead, we find that good assembly can be predicted 
by a combination of two design rules: the thermodynamic 
driving force for crystallization (defined as the free en- 
ergy difference between the gas and the crystal) must be 
1 — 2 /cbT, and interactions should be made as nonspecific 
as possible without promoting liquid-vapor phase sepa- 
ration. In experimental terms, our results suggest ad- 
justing solution conditions in order to impose a defined 
supersaturation at the liquid-vapor binodal. While crys- 
tallization can happen at or below the binodal, we find 
that such large-scale nonspecific association usually leads 
to slow dynamics and poor yield. Taken in the context of 
recent simulation work |34H36] , our findings suggest that 
the rules governing the assembly of protein crystals are 
in important ways more like those governing viral capsid 
self-assembly than those underpinning the crystallization 
of spherical colloids. 
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FIG. 1: (a) Monomer geometry, (b) Square lattice stabilized 
by the two chemically specific interactions: internal bonds 
(filled circles) and external bonds (open circles). 

Model and methods. We consider a generalization 
of the model SbpA surface-layer protein introduced in 
Ref. [24]. Hard rectangular monomers of width a (= 3.9 
nm) and length la {I = 2.2) live on a smooth, two- 
dimensional substrate. Monomer interactions acknowl- 
edge the tendency of SbpA proteins to form both un- 
structured aggregates [30 and an open square lattice of 
tetramers [31 . To allow formation of the square lat- 
tice, monomers are decorated by three patches labeled E 
(edge), S (short arm) and L (long arm), each located on 
the hard-core boundary a distance a/2 from the nearest 
vertex, as shown in Fig. [l] Patches mediate a chemically 
specific internal bond of energy — eint if the E and 
S patches of neighboring monomers approach closer than 
A = a/5 and an external bond of energy — Cext if two 
L patches of neighboring monomers approach closer than 
A. To permit unstructured aggregation, monomers also 
experience a nonspecific attraction of energy —e^k^T if 
their surrounding rectangular forcefields (of width a + 2A 
and length la + 2A) overlap. 

To determine design rules for assembly, we extensively 
varied all three energetic parameters and the packing 
fraction (j). We will discuss the effects of separately vary- 
ing eint and Cext elsewhere. Here, we present results for 
^ext/eint = 2 in terms of a single specific interaction pa- 
rameter eg = Cint = 2eext- We find that the results pre- 
sented here are largely insensitive to the choice of the 
ratio eext/eint- 

We solved model thermodynamics in two ways, as de- 
tailed in the [supplemental methods section| that is avail- 
able online, We used analytic mean- field theory to de- 
termine the thermodynamic driving force for assembly, 
phase boundaries for stable and met ast able phases, and 
reduced second virial coefficients B2 = 52/^2^^^^°^^- 
B2 = (47r)"^ / dri2d0i2 (1 - e'^^^^) is calculated in the 
conventional way, integrating over the phase space of two 
model proteins interacting via the energy [/12. The hard- 
core normalization ^2 is obtained similarly, but for 
a system with no attractive interactions. We calculated 
phase diagrams numerically using direct coexistence and 
Gibbs ensemble simulations [37 . We find that phase di- 



agrams calculated by mean-field theory and simulation 
agree, except that simulation reveals a narrow region of 
thermodynamically stable liquid that the mean-field the- 
ory does not attempt to account for. 

We determined self-assembly dynamics using virtual- 
move Monte Carlo simulations [38] of 1024 monomers at 
constant packing fraction, starting from well-mixed con- 
ditions. Although a truly physical dynamics cannot be 
effected by simulations that do not explicitly represent 
solvent, some important aspects of real overdamped mo- 
tion are retained by this algorithm: bodies move locally 
according to potential energy gradients, and collective 
diffusion constants can be scaled according to cluster size 
and shape. Here, we parameterized the algorithm to en- 
sure that tightly-bound protein clusters of hydrodynamic 
radius R diffuse according to the Stokes solution for the 
overdamped motion of a sphere of radius i?, resulting in 
diffusive behavior for moderate to large clusters that is 
more realistic than that effected by basic Brownian dy- 
namics integrators. Taking a = 3.9 nm, T = 300 K, and 
solution viscosity 77 = 1.00 x 10~^ Pa s, each Monte Carlo 
(MC) cycle corresponds to 2.42 ns. 

Results. We carried out two numerical protocols, each 
designed to mimic a particular experiment. First, for 
three selected 'proteins,' each with a different balance 
of specific and nonspecific interactions, we determined 
where on the conventional temperature-concentration 
phase diagram yield is best. Second, we determined 
the microscopic mechanisms for optimal assembly by in- 
dependently varying specific and nonspecific interaction 
strength. Such a protocol mimics studying a large en- 
semble of related proteins or varying solvent chemistry 
to optimize assembly for a single protein. 

In Fig. [2] (a) we show temperature-concentration phase 
diagrams for three model proteins: specifin, with eg/cn = 
2; intermedin^ with eg/cn = 1.5; and nonspecifin, with 
^s/^n = 1- From protein to protein, the solubility curve 
shifts with interaction specificity more than the liquid- 
vapor binodal, leading to a change of phase diagram 
structure ^ similar to that effected by changing the 
range of attraction of a sphere j3l [39H4T] . Specifin and in- 
termedin display a metastable liquid-vapor coexistence, 
while nonspecifin displays a stable liquid-vapor coexis- 
tence. Intermedin and nonspecifin also display a transi- 
tion from a square lattice {(j) ~ 0.70) to a close-packed 
crystal {(j) ^ 0.76) at high temperature. 

To reveal how well these proteins crystallize, we over- 
lay the phase diagrams with color maps quantifying the 
crystal yield obtained after long dynamic simulations. 
Green indicates high yield; red, low yield. Specifin self- 
assembles best above the liquid- vapor critical point, in- 
termedin assembles best near or just below it, and non- 
specifin crystallizes poorly throughout its phase diagram. 
Below the binodal, monomers generally form kinetically 
sluggish gel-like or microcrystalline clusters that lead to 
poor yield. We illustrate dynamic trajectories leading 




FIG. 2: Optimal yield is not predicted by B2 or the position of the liquid-vapor hinodal. (a) Phase diagrams (temperature 
T = l/cn vs packing fraction cj)) for three model proteins whose interaction specificities decrease from left to right. We overlay 
the analytic solubility curve (solid), which agrees with the numeric data with no adjustable parameters, lines of fixed driving 
force F (dashed), and a color map of square lattice yield (obtained after dynamic simulations of 10^ MC cycles). The position 
of best yield does not track the liquid- vapor binodal (critical points shown as stars) or a fixed slot of B2 (right ticks). Insets 
show phase diagrams on a more conventional linear horizontal scale, (b) Snapshots show example dynamic simulations for 
each protein after 10^ (left) and 5 x 10^ (right) MC cycles for the points on the phase diagrams labeled by open circles. 



to these outcomes in Fig. [2] (b) by showing snapshots at 
early (10^ MC cycles) and late (5 x 10^ MC cycles) times 
for near-optimal conditions for each protein. (See also 
the corresponding online movies Ml] M2l and M3,) 

For this set of proteins, optimal assembly does not 
track the liquid- vapor binodal. Moreover, assembly is not 
predicted by the crystallization slot [5 , which provides 
a necessary but not a sufficient condition for crystalliza- 
tion: good assembly indeed generally occurs in a slot 
— 100 ^ ^2 ^ but peak yield within this slot can 
be highly localized (specifin), or uniformly poor (non- 
specifin). From our analytic theory we calculated lines 
of constant J^, the thermodynamic driving force for crys- 
tallization. The proteins that assemble well do so in the 
window = 1 — 2 /cbT. Our analytic theory demon- 
strates that this window is equivalent to a supersatu- 
ration S = (/)/0gas(^) = 5 — 20, where 0gas(^) is the 
solubility packing fraction. 

We can clarify the molecular dynamical origins of opti- 
mal assembly by surveying an ensemble of related model 
proteins. We do this in Figs. [3] and [4] by indepen- 
dently varying specific and nonspecific interactions at 
fixed packing fraction <p = 0.1. Fig. [3] shows a color map 
of crystal yield overlaid on the phase diagram spanned by 
the two interactions. The surrounding simulation snap- 



shots label the equilibrium phase or coexisting phases 
within each region of the phase diagram. The three pro- 
teins of Fig. [2] lie on the dash-dotted yellow lines. Fig. [4] 
shows 'pathway diagrams' [33] which identify, along self- 
assembly trajectories, the maximum fractions of 'mis- 
bound' proteins (those with their external specific bond 
satisfied but only one of their two internal specific bonds 
satisfied) and nonspecifically aggregated proteins (those 
having no specific bonds and two or more nonspecific 
bonds). 

Optimal yield occurs in the part of the phase diagram 
identified by two conditions: 1) the thermodynamic driv- 
ing force for crystallization must be 1 — 2 /cbT, and 
2) the nonspecific attraction must be as large as possi- 
ble without inducing liquid-vapor phase separation. The 
window of optimal T lies between the weakly supersatu- 
rated region near the solubility curve, where nucleation 
barriers are too high for crystallization to happen in the 
allot ed simulation time, and the strongly supersaturated 
region at large eg, where misbinding predominates (see 
the 'misbound' pathway diagram of Fig. |4|. The ther- 
modynamic driving force is substantially more predictive 
than the second virial coefficient; while most good as- 
sembly occurs with the displayed slot —100 < B2 ^ —5, 
this slot includes large parts of the phase diagram where 
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FIG. 3: i^est assembly occurs at moderate super saturation 
and away from the liquid-vapor demixing line. Phase dia- 
gram and dynamic yield color map for = 0.1 and a range of 
protein specific and nonspecific interactions. Solid (dashed) 
grey curves denote the stable (metastable) boundaries for the 
labeled simulated coexistence combinations. All boundaries 
were calculated using analytic theory, except for the boundary 
between homogeneous and phase- separated monomer fluids; 
this was determined using Gibbs ensemble simulations. The 
dashed black (white) curves denote lines of constant B2 (driv- 
ing force T). Dash-dotted yellow lines represent the proteins 
from Fig. [2] for a fixed protein, temperature increases to the 
left along these lines. 



assembly is poor, and regions in which the target crystal 
is not stable. 

Within the window = 1 — 2 /cbT, yield initially in- 
creases as specific interaction strength is traded for non- 
specific interaction strength [24]. However, this trend 
terminates at the metastable liquid phase boundary. As 
the 'aggregated' pathway diagram of Fig. [4] indicates, this 
phase boundary signals the onset of large-scale nonspe- 
cific aggregation. Density fluctuations associated with 
phase separation therefore conflict with the symmetry 
fluctuations required to stabilize the open crystal lattice. 
This behavior is strikingly distinct from that of isotropic 
spheres, which crystallize best at the metastable critical 
point. However, assembly of model capsomer proteins 
into icosahedral viral capsids [34H36] shares the behavior 
of the present model; there, assembly is also impaired by 
liquid-like aggregation at low interaction specificity [34] . 
We conjecture that for open structures in two and three 
dimensions stabilized by anisotropic attractions, weak 
nonspecific association should aid assembly, but large 
density fluctuations associated with the liquid- vapor crit- 




FIG. 4: Large-scale nonspecific aggregation hinders crystal as- 
sembly. 'Pathway diagrams' show color maps of the maximum 
fractions of misbound and nonspecifically aggregated proteins 
for (j) — 0.1 and a range of protein specific and nonspecific in- 
teractions (see Fig.[3|. Comparison of the yield (Fig. [3| and 
pathway color maps reveals a rule of thumb for optimizing 
assembly: impose the strongest nonspecific interaction that 
does not induce liquid- vapor phase separation. 



ical point should generally impair it. 

Conclusions. Typical protein phase diagrams resem- 
ble those of isotropic colloids bearing short-range attrac- 
tions, but, crucially, they describe different solid struc- 
tures. Here we have shown that the self-assembly of an 
open crystal formed by a model surface-layer protein is 
different in significant ways from the assembly of a close- 
packed crystal. However, it can be rationalized by a set 
of relatively simple design rules. First, the thermody- 
namic driving force for crystallization must be of order 
/cbT. Second, interactions should be adjusted to trade 
specific interaction strength for weak nonspecific associ- 
ation; substantial nonspecific aggregation is deleterious. 

Our results suggest quantitative guidelines for optimiz- 
ing crystal yield in real protein systems. Our window of 
optimal thermodynamic driving force corresponds to a 
supersaturation of 5 to 20. Achieving such a supersatura- 
tion without inducing large-scale nonspecific aggregation 
requires ensuring a large enough 'metastability gap' [42] 
between the solubility curve and the liquid-vapor bin- 
odal. Inspection of the dash-dotted yellow lines in Fig. [3] 
reveals that a protein with no metastability gap (non- 
specifin) or a small metastability gap (intermedin) can 
be transformed into a protein with a large metastabil- 
ity gap (specifin) by increasing the specific interaction 
strength. Such a transformation may be possible for real 
proteins by adjusting solvent chemistry. For instance, re- 
cent experiments suggest that increasing multivalent salt 
concentration can facilitate protein crystal assembly by 
inducing specific contacts between proteins [43] . 
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